A  STATISTICAL  THEORY  OF  COMPUTER  PROGRAM  TESTING


Arthur  E.  Laemmel 
Abstract 


Most  computer  programs  are  tested  with  some  of  the  possible  sets  of 
input  data,  but  few  can  be  tested  with  all  possible  input  data.  Passing 
such  a  partial  test  cannot  insure  that  the  program  will  always  function 
correctly;  we  can  perhaps  say  that  the  probability  of  failure  is  less  than  a 
specified amount. It is the purpose of this report to derive several
formulas for the above probability. Also, in some cases an optimum testing
strategy can be derived which minimizes that probability of failure.


1.0  Introduction


There are three methods for maximizing the reliability of computer
programs: (1) use a systematic procedure which makes it difficult for
errors to occur during the writing of the program, (2) prove that the
program works correctly by some formal or automatic process, and (3) test
and debug the program thoroughly before passing it on to the user. Most
programmers will use some combination of these methods, and in fact, some
procedures involve elements of more than one method. The present report
emphasizes testing, but first some remarks will be made about program
writing and proving.

(1)  Writing Correct Programs: While no one intentionally inserts
errors in his program, it is undoubtedly true that many
people would produce more reliable programs with less effort
if they were taught better programming techniques. However,
it seems obvious that the average programmer should
test his programs even if he exercises the maximum of care
and uses the best techniques.

(2)  Proving Program Correctness: There are several reasons
why a formal procedure for proving program correctness
cannot be relied on to insure absence of errors in practical
situations: (i) a uniform algorithm for proving the correctness
of an arbitrary program can be shown to be impossible,
being essentially equivalent to Turing's halting
problem; (ii) even for solvable sub-cases of the correctness-proving
problem, the usual method (some improvement
on Herbrand search) is so time consuming as to be impractical;
(iii) there is always a possibility of error in the
proving program, or in applying it to the program being
tested.

(3)  Testing Computer Programs: In view of the difficulties of
validating a computer program by programming techniques or
formal proof methods, it is believed that some amount of
testing will always be necessary. The purpose of this
report is to describe a model which shows the relationship
between errors of different types and the probability that
they will cause a program to fail, and also to suggest optimum
testing methods which minimize the probability of
program failure.

Throughout this report it is assumed that a computer program can be
tested and that it will either pass or fail the test. It is also assumed that
a tester and a user will interpret failure of the program in exactly the same
way. For example, failure might mean one of the following:

i)  the  program  "bombs"  completely. 

ii)  an obviously wrong answer is given.

iii)  a number is inaccurate in the [...].


iv)  the  numbers  are  correct,  but  the  format  is  wrong. 

v)  the  program  works  perfectly,  but  a  side  effect  causes  failure 
in  another  program. 

It must also be decided whether the program alone is being tested, or
whether the program and algorithm are being tested. This report is directed
mainly to the latter, but the results can be suitably interpreted so as to
apply to the former.

The extent to which a computer program (possibly including the underlying
algorithm) can be tested varies from application to application and is
usually not a yes-no situation. Whether or not a statistical model applies
to a particular case depends very strongly on the tester's knowledge of just
what answer should be produced by the program. Some of the many possibilities
of the user's prior knowledge of the correct answer are:

1.  The exact answer is known beforehand. An example would
be a math package for a new computer. Accurate tables of
cos, arctan, log, etc. have been available for many years.

2.  A proposed answer can be checked to see if it is a correct
answer, but the correct answer is not known beforehand.
Programs which calculate the roots of transcendental equations
are examples of this category.

3.  Answers for certain special input values are known beforehand.
This is very common. For example, if a program is
supposed to output the capacitance of an arbitrary two
conductor transmission line, it would be natural to test it for
coaxial cylinders.

4.  The answer is not known beforehand, nor can it be accurately
checked for any combination of input values. However,
certain consistency relations among different outputs
are known. For example, it may be obvious that the program
should generate an output which is a monotonically
increasing function of the input. A test which detects a
decrease indicates an incorrect program, but no amount of
testing can indicate a correct program.

5.  Absolutely nothing is known about the answers which should
be produced. This is probably very rare. Even here, the
theory presented in this report applies if the program
"bombs," i.e., a fatal or non-fatal run-time error message is
produced.
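Categories 2 and 4 above describe pass/fail checks that do not need the
correct answer in advance. The following is a minimal sketch of two such
checks, not a method given in the report; the function names, the tolerance,
and the sample programs are all hypothetical choices made for illustration.

```python
import math


def is_root(f, x, tol=1e-9):
    """Category 2: accept a proposed root x if f(x) is within tol of zero.

    The root itself need not be known beforehand; only substitution is used.
    """
    return abs(f(x)) <= tol


def first_monotonicity_violation(program, inputs):
    """Category 4: return the first pair of adjacent inputs whose outputs
    decrease, or None if no decrease is seen among the sampled inputs.

    A returned pair proves the program wrong; None proves nothing.
    """
    xs = sorted(inputs)
    ys = [program(x) for x in xs]
    for i in range(len(xs) - 1):
        if ys[i + 1] < ys[i]:
            return (xs[i], xs[i + 1])
    return None


# Hypothetical usage: a transcendental equation cos(x) = x, and a
# deliberately broken "increasing" function.
transcendental = lambda x: math.cos(x) - x
broken = lambda x: x if x < 3 else x - 2
print(is_root(transcendental, 0.7390851332151607))
print(first_monotonicity_violation(broken, range(6)))
```

The second check is one-sided by construction, which mirrors the remark in
item 4 that a detected decrease indicates an incorrect program while a clean
run does not establish correctness.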

The identification of an error is often not unique. This is illustrated
by the fact that error messages from a compiler or run-time monitor can
direct the user away from what he considers as the error. For example, if
the compiler says "line 15, operator missing" the error might really be "line
11, statement terminator missing." In many cases the syntax can be corrected
in several places to get the program thru the compile stage, but one
place usually contains the reported error, and another place is usually
where the error should be corrected. Similar comments can be made about
semantic errors detected at run time. In some cases an error might equally
well be corrected in one of several places; for example, a common PL/I
error might be corrected by either declaring K to be a FLOAT number, by
using XK instead, or by forcing a necessary type conversion. This type of
ambiguity in defining the error is not believed to cause any difficulties with
the formulas developed in this report, provided the parameters are
interpreted correctly.

2.0  Definitions  and  Objectives 

2.1  Definitions  Some basic aspects of the testing process apply equally
well to a computer program or to a physical device. For this reason the
program or device being tested will sometimes be called the module. After
the module has been constructed, it is checked by a tester and then employed
by a user. The probability of error, $P_e$, which occurs during use
is given by

$$P_e = P_m P_u \qquad (1)$$

where $P_m$ is the probability that the tester misses all of the residual bugs
in the module, and $P_u$ is the probability that the user then encounters one
of the overlooked bugs. If exhaustive testing is possible then $P_e = 0$,
since it is assumed that if a bug is found it then is corrected and the
whole process is started again. In most cases of interest exhaustive testing
is not possible or practical. If there are no bugs then all of the probabilities
are zero and this possibility appears as a special case in the analysis
to follow. Some alternate definitions of "probability of error" are given in
Section 


2.2  Objectives  Before presenting the detailed mathematical results,
the objectives of this work will be stated more explicitly. Roughly, the
aims are to provide an estimate of the number of tests required to say
that the program being tested is correct with a given probability, and to
provide a strategy so that a given number of tests are chosen most
effectively.

The first objective is met by deriving a functional relationship between
the number of tests and the probability of error. There are several
possibilities for defining probability of error; the one used here might be
called the "probability of embarrassment" for the tester. The tester is not
embarrassed if

he discovers a bug and returns the program to the writer, or
he approves the program in spite of its having one or more
bugs, but the user does not encounter one of the remaining
bugs.

The tester is embarrassed if

he approves the program and then the user encounters an
undetected bug.

If the model is designed appropriately, the probability of error decreases
as the number of tests is increased according to Eq. 30 below.


The second objective is met by choosing those tests which minimize the
expression derived for probability of error (defined as above) while keeping
the number of tests fixed. Specifically, the principal question which must
be answered here is: should testing be concentrated on input data combinations
which are most likely to be chosen by the user, or input data
combinations with a large a priori probability of causing program failure?
The answer is that tests should be applied to both of these combinations of
input data in a ratio which can be calculated from Eq. 33 below.

Actually, several simpler models are also analyzed before that described
by Eqs. 30 and 33. All of these expressions contain many parameters,
and curves are plotted for selected values. No real data were used
for the parameters, but this should certainly be done in the future.

3.0  Models 


3.1  Elementary Model  A simple case might be the following: The
module has N possible input values, and each of these is equally likely to
be chosen by the tester or by the user. Of these input values, W cause
improper functioning of the module, but neither the tester nor the user
knows which inputs cause errors or even how large W is. The tester
chooses t inputs at random without keeping a record of inputs previously
tested, i.e., sampling with replacement. Under these circumstances
$P_u = W/N$, $P_m = (1 - W/N)^t$, and

$$P_e = \frac{W}{N}\left(1 - \frac{W}{N}\right)^{t} \approx \frac{W}{N}\, e^{-Wt/N} \qquad (2)$$

A plot of $P_e$ vs. W is displayed in Fig. 1. If the testing is to do any
good, i.e., to reduce $P_e$ significantly below W/N, then it is necessary
that

$$t > \frac{N}{W} \qquad (3)$$

In many applications it is found that satisfying the inequality of Eq. (3)
requires a very large number of tests t, and that our intuitive feeling is
that $P_e$ is acceptably small in spite of testing using far fewer tests than
this.


[Figure 1: error probability $P_e$ plotted against W. Legend: N = number
of input values; W = number of input values causing malfunctions;
t = number of inputs tested; the worst value of W is marked.]

FIGURE 1.  PLOT OF $P_e$ VS. W FROM THE MODEL OF EQUATION 2.
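The shape of the curve in Fig. 1 can be explored numerically. The sketch
below evaluates the elementary model for sampling with replacement,
$P_e = (W/N)(1 - W/N)^t$, and searches for the value of W that is worst for
the tester at a fixed t. The function names are illustrative only. Setting
the derivative with respect to W/N to zero suggests the maximum lies at
W/N = 1/(t+1), which the search reproduces in the example.

```python
import math


def elementary_error_probability(n_inputs, n_bad, n_tests):
    """P_e = (W/N) * (1 - W/N)**t for random testing with replacement."""
    frac = n_bad / n_inputs
    return frac * (1.0 - frac) ** n_tests


def worst_case_bad_count(n_inputs, n_tests):
    """Brute-force search over W = 0..N for the W that maximizes P_e."""
    return max(
        range(n_inputs + 1),
        key=lambda w: elementary_error_probability(n_inputs, w, n_tests),
    )


# Illustrative values: N = 1000 inputs, t = 9 tests.
w_star = worst_case_bad_count(1000, 9)
print(w_star, elementary_error_probability(1000, w_star, 9))

# The exponential approximation is close when W/N is small.
exact = elementary_error_probability(10**6, 10, 10**5)
approx = (10 / 10**6) * math.exp(-10 * 10**5 / 10**6)
print(exact, approx)
```

Both ends of the curve vanish: with W = 0 the user never meets a faulty
input, and with W = N the tester never accepts the module.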


It will continue to be assumed that the tester passes no information to
the user about which inputs were tested. It will also be assumed that each
test either succeeds or fails and that there is no additional information
which would permit sequential sampling methods. Sampling without replacement
would slightly lower $P_m$ to

$$P_m = \begin{cases} \dfrac{(N-W)!\,(N-t)!}{(N-W-t)!\,N!}, & t \le N-W \\[2ex] 0, & t > N-W \end{cases} \qquad (4)$$

Note that this method requires that the tester keep a record of inputs
already tested, or that he avoids duplication by other means.
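To see how much is gained by avoiding repeated tests, the two sampling
schemes can be compared directly. This is a sketch under the assumption
that the quantity being lowered is the probability that all t tests miss
every faulty input; for sampling without replacement that probability is
the ratio of binomial coefficients C(N-W, t) / C(N, t), which is the same
as the factorial expression (N-W)!(N-t)! / ((N-W-t)! N!). Function names
are illustrative.

```python
from math import comb


def miss_probability_with_replacement(n_inputs, n_bad, n_tests):
    """Every one of t independent draws lands on a good input."""
    return (1.0 - n_bad / n_inputs) ** n_tests


def miss_probability_without_replacement(n_inputs, n_bad, n_tests):
    """All t distinct tested inputs are good: C(N-W, t) / C(N, t).

    Once t exceeds N - W, at least one faulty input must be tested.
    """
    if n_tests > n_inputs - n_bad:
        return 0.0
    return comb(n_inputs - n_bad, n_tests) / comb(n_inputs, n_tests)


# Illustrative values: N = 20, W = 3, t = 5.
print(miss_probability_with_replacement(20, 3, 5))
print(miss_probability_without_replacement(20, 3, 5))
```

For N much larger than t the two values are nearly equal, which is why the
text describes the improvement as slight.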

The particular type of error to which $P_e$ pertains must be borne in
mind to avoid confusion with other possible definitions of error. If W=N all
inputs to the module cause malfunctions but $P_e = 0$ according to Equations
(2) and (1). This is true because the tester removes user bugs with each
test, and continues until all tests are exhausted. In this case the tester
always rejects the module (provided only that t > 0) and so the user cannot
experience a malfunction. Note that $P_e$ is neither the probability {user has
a malfunction} nor the probability {user has a malfunction | tester accepts
module}. Rather, $P_e$ is the probability {user has a malfunction, and tester
accepts module}. If the number of tests is [...], and if testing with
[...] is done, then the value [...]

[Eq. (5) is not recoverable from the source.]  (5)


From Eq. (5) or Fig. 1, it can be seen that as W → 0 then $P_e$ → 0 also.
This is so because for small W there is a small chance that the user will
encounter a faulty input. Similarly as W → N then $P_e$ → 0 also because
there is little chance that the tester will accept the module. Of course, the
last case is undesirable for reasons other than the value of $P_e$, i.e., the
user has a small probability of receiving a released program.

3.2  Model with Unequal Probabilities  The model described in the
preceding section is too simple to apply to most practical testing situations:
some inputs are more likely to fail than others, and the probability of one
input failing may not be independent of another input failing. Often a
single bug may cause many inputs to fail. The tester may not choose the
inputs to be tested randomly, but rather in such a way as to utilize his
knowledge of the prior failure probabilities. The user may not be free to
choose more reliable inputs; in fact, he may be constrained by the problem
to use less reliable inputs. The model to be described here includes three
events, the last two being independent of each other but dependent on the
first.

(1)  A programmer constructs a module which has an error pattern
$\alpha$ with probability $P(\alpha)$. $\alpha$ might be a binary vector
$(\alpha_1, \alpha_2, \ldots, \alpha_N)$ with $\alpha_i = 1$ meaning input i fails and
$\alpha_i = 0$ meaning input i functions correctly.

(2)  A tester tries certain inputs to the module and accepts the
module with probability $R(\text{accept} \mid \alpha)$. The tester passes no
information concerning which inputs were tested to the user.

(3)  A user selects one of the inputs and the module fails with
probability $Q(\text{fail} \mid \alpha)$. The error probability defined
previously is now given by

$$P_e = \sum_{\alpha \in A} P(\alpha)\, Q(\text{fail} \mid \alpha)\, R(\text{accept} \mid \alpha) \qquad (6)$$

where A is the set of all possible error patterns.

To illustrate, if there are N input values then A consists of $2^N$ elements.
Assume $P(\alpha)$ is 0 for all $\alpha$ in A except for the pattern in which
the first W inputs fail:

$$\alpha_W = (\underbrace{1, 1, \ldots, 1}_{W},\ \underbrace{0, 0, \ldots, 0}_{N-W}) \qquad (6a)$$

and let the probability of the user selecting input i be $q_i$. Then

$$Q(\text{fail} \mid \alpha_W) = \sum_{i=1}^{W} q_i$$

Assume the tester selects his inputs randomly, choosing input i with
probability $r_i$.* Then

$$R(\text{accept} \mid \alpha_W) = \prod_{j=1}^{W} (1 - r_j)$$

$$P_e = \sum_{i=1}^{W} q_i \prod_{j=1}^{W} (1 - r_j)$$

If $q_i = 1/N$ and $r_i = t/N$ this reduces (approximately) to the elementary
model given above:

$$P_e = \frac{W}{N}\left(1 - \frac{t}{N}\right)^{W} \qquad (8)$$
Another, more useful, form for $P(\alpha)$ than Eq. (6a) is obtained by assuming
that the i'th input malfunctions with probability $p_i$. Then

$$P(\alpha) = \prod_{i=1}^{N} p_i^{\alpha_i} (1 - p_i)^{1-\alpha_i}$$

$$Q(\text{fail} \mid \alpha) = \sum_{i=1}^{N} \alpha_i q_i$$

$$R(\text{accept} \mid \alpha) = \prod_{i=1}^{N} (1 - r_i)^{\alpha_i}$$

The formula also applies if $r_i$ is given the interpretation "input i is tested
and the response is noted to be wrong by the tester." Specifically $r_i = 0$
might mean either that input i was not tested, or that it was tested and
an error was not detected.


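The two products can be combined in evaluating $P_e$:

$$P_e = \sum_{\alpha_1} \sum_{\alpha_2} \cdots \sum_{\alpha_N} \; \sum_{i=1}^{N} \alpha_i q_i \prod_{j=1}^{N} F_j(\alpha_j)$$

where

$$F_j(\alpha_j) = p_j^{\alpha_j} (1 - p_j)^{1-\alpha_j} (1 - r_j)^{\alpha_j}$$

This reduces to

$$P_e = \prod_{j=1}^{N} (1 - p_j r_j) \sum_{i=1}^{N} q_i\, \frac{p_i (1 - r_i)}{1 - p_i r_i} \qquad (9)$$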
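The reduction to the closed form of Eq. (9) can be checked numerically for
a small N by summing over all $2^N$ error patterns. The sketch below does
this; the function names and the sample probabilities are illustrative
assumptions, and the closed form as written here requires $p_i r_i < 1$ for
every input.

```python
from itertools import product
from math import isclose, prod


def error_probability_by_enumeration(p, q, r):
    """Sum P(a) * Q(fail | a) * R(accept | a) over all 2**N patterns a."""
    total = 0.0
    for alpha in product((0, 1), repeat=len(p)):
        pattern_prob = prod(pi if a else 1.0 - pi for pi, a in zip(p, alpha))
        user_fails = sum(qi for qi, a in zip(q, alpha) if a)
        tester_accepts = prod(1.0 - ri for ri, a in zip(r, alpha) if a)
        total += pattern_prob * user_fails * tester_accepts
    return total


def error_probability_closed_form(p, q, r):
    """Product-and-sum form of Eq. (9)."""
    scale = prod(1.0 - pi * ri for pi, ri in zip(p, r))
    return scale * sum(
        qi * pi * (1.0 - ri) / (1.0 - pi * ri) for pi, qi, ri in zip(p, q, r)
    )


# Hypothetical parameters for N = 4 inputs.
p = [0.1, 0.3, 0.05, 0.2]   # failure probability of each input
q = [0.4, 0.3, 0.2, 0.1]    # user's selection probabilities
r = [0.5, 0.0, 0.9, 0.25]   # probability each input gets tested
print(error_probability_by_enumeration(p, q, r))
print(error_probability_closed_form(p, q, r))
```

The enumeration costs $2^N$ terms while the closed form costs N, which is the
practical reason for carrying out the summation analytically.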


If the tester selects t inputs deterministically, and if the inputs are
permuted so that these occur first, then

$$P_e = \prod_{j=1}^{t} (1 - p_j) \sum_{i=t+1}^{N} p_i q_i \qquad (10)$$
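For small N the best deterministic test set can be found by brute force,
which gives a reference point for any selection rule. The sketch below
evaluates the deterministic form of $P_e$ (product over tested inputs of
$1 - p_j$, times the sum over untested inputs of $p_i q_i$) for every test
set of size t. The names and the sample numbers are hypothetical.

```python
from itertools import combinations
from math import prod


def deterministic_error_probability(p, q, tested):
    """P_e when exactly the inputs in `tested` are tried by the tester."""
    tested = set(tested)
    passes_all_tests = prod(1.0 - p[j] for j in tested)
    user_hits_untested_failure = sum(
        p[i] * q[i] for i in range(len(p)) if i not in tested
    )
    return passes_all_tests * user_hits_untested_failure


def best_test_set(p, q, n_tests):
    """Exhaustive search over all test sets of the given size (small N only)."""
    return min(
        combinations(range(len(p)), n_tests),
        key=lambda s: deterministic_error_probability(p, q, s),
    )


# Hypothetical case: input 0 is the most failure-prone but is rarely used,
# input 1 is moderately failure-prone and heavily used.
p = [0.5, 0.1, 0.1]
q = [0.01, 0.6, 0.39]
print(best_test_set(p, q, 1))
print(deterministic_error_probability(p, q, best_test_set(p, q, 1)))
```

In this example the single best test is input 1, not the input with the
largest failure probability, so ranking inputs by $p_i$ alone is not
sufficient.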


An optimum testing strategy is obtained if the inputs are permuted so that
the expression is minimized. The first factor suggests testing inputs with
the largest $p_j$, but the second factor suggests testing inputs with the
largest $p_i q_i$. Let the above expression be abbreviated as $P_e = P_m P_u$ and
note that $P_u$ is the probability of the user getting an error on an untested
input. Consider the effect of adding one more input, k, to the test.
Let

$$P_e' = P_m (1 - p_k)(P_u - p_k q_k) \approx P_e \left( 1 - \frac{(q_k + P_u)\, p_k}{P_u} \right)$$

Thus, the criterion is to select the input with the largest

$$(q_k + P_u)\, p_k \qquad (11)$$

As can be seen, this provides a weighted compromise between selection on
the basis of $p_k$ or $q_k p_k$ alone.

3.3  Model with Statistical Dependence  Computer programs usually
fail for a whole set of input values as a result of a single bug. It is more
realistic to assume that failures due to different bugs are statistically
independent rather than failures for different input values. For example, a
single oversight might cause a square root program to fail for all negative
numbers. Let

$$\sigma_{ij} = \begin{cases} 1 & \text{if the } j\text{'th bug causes failure for input } i \\ 0 & \text{if the } j\text{'th bug doesn't affect input } i \end{cases}$$


If $\beta_j$ is the probability of the j'th bug, if M is the number of possible bugs,
and if $\theta_j$ (j = 1,2,...,M) is the pattern of actual bugs, then (note
definitions of P, Q, R in the probability of event statements (1), (2), and (3) of
Section 3.2)

$$P_e = \sum_{\theta} P(\theta)\, Q(f \mid \theta)\, R(a \mid \theta) \qquad (12)$$

This is analogous to the corresponding formula in $\alpha$ given above. Here,

$$P(\theta) = \prod_{i=1}^{M} \beta_i^{\theta_i} (1 - \beta_i)^{1-\theta_i}$$

$$Q(f \mid \theta) = \sum_{j=1}^{N} q_j \left[ 1 - \prod_{k=1}^{M} (1 - \sigma_{jk}\theta_k) \right] \qquad (13)$$

$$R(a \mid \theta) = \prod_{i=1}^{M} \prod_{\ell=1}^{N} (1 - r_\ell)^{\sigma_{\ell i}\theta_i}$$

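Combining and rearranging gives

$$P_e = \sum_{j=1}^{N} q_j \sum_{\theta_1} \cdots \sum_{\theta_M} A_j(\theta_1, \theta_2, \ldots, \theta_M) \prod_{i=1}^{M} F_i(\theta_i)$$

where

$$F_i(\theta_i) = \beta_i^{\theta_i} (1 - \beta_i)^{1-\theta_i} \prod_{\ell=1}^{N} (1 - r_\ell)^{\sigma_{\ell i}\theta_i}$$

and

$$A_j(\theta_1, \ldots, \theta_M) = 1 - \prod_{k=1}^{M} (1 - \sigma_{jk}\theta_k)$$

The summations over $\theta_i$ are for only two values (0,1) and can be carried
out to give

$$P_e = \prod_{i=1}^{M} \left[ 1 - \beta_i + \beta_i \prod_{\ell=1}^{N} (1 - r_\ell)^{\sigma_{\ell i}} \right] - \sum_{j=1}^{N} q_j \prod_{i=1}^{M} \left[ 1 - \beta_i + (1 - \sigma_{ji})\, \beta_i \prod_{\ell=1}^{N} (1 - r_\ell)^{\sigma_{\ell i}} \right] \qquad (14)$$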

If deterministic testing is used (r = 0,1) over the first t inputs:

$$P_e = {\prod_i}' (1 - \beta_i) \sum_{j=t+1}^{N} q_j \left[ 1 - {\prod_i}'' (1 - \sigma_{ji}\beta_i) \right]$$

where $\prod'$ refers to a product over terms involving a value of i such that
$\sigma_{ji} = 1$ for at least one value of j in the range j = 1,2,...,t, and where
$\prod''$ refers to the other values of i. The bugs can be permuted so that
$1 \le i \le \tau$ implies $\sigma_{ji} = 1$ for at least one value of j from 1 to t
and $\tau < i \le M$ implies $\sigma_{ji} = 0$ for all values of j from 1 to t. Then

$${\prod_i}' = \prod_{i=1}^{\tau} \quad \text{and} \quad {\prod_i}'' = \prod_{i=\tau+1}^{M}$$

The problem is then to minimize

$$P_e = \prod_{i=1}^{\tau} (1 - \beta_i) \sum_{j=t+1}^{N} q_j \left[ 1 - \prod_{i=\tau+1}^{M} (1 - \sigma_{ji}\beta_i) \right] \qquad (15)$$


A  graphical  interpretation  of  the  above  is  portrayed  in  Fig.  2,  and 
illustrates  how  different  inputs  excite  various  bugs.  For  example,  input  2 
excites bugs numbered 6,7,8 and 9. Bug numbered 8 can only be discovered
through the application of inputs 2 or 4.
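The deterministic-testing result for this model can be checked against a
direct enumeration over bug patterns. The sketch below is an illustration
under stated assumptions: `sigma[j][i]` is 1 when bug i causes input j to
fail, `beta[i]` is the probability that bug i is present, bugs are
independent, and the tester rejects the module as soon as any tested input
fails. The small matrix is hypothetical and is not the pattern of Fig. 2.

```python
from itertools import product
from math import isclose, prod


def covered_bugs(sigma, tested):
    """Bugs that cause a failure on at least one tested input."""
    return {i for j in tested for i, hit in enumerate(sigma[j]) if hit}


def product_form_error_probability(sigma, beta, q, tested):
    """Covered bugs must all be absent; the user then fails on an untested
    input only through an uncovered bug."""
    tested = set(tested)
    covered = covered_bugs(sigma, tested)
    no_covered_bug = prod(1.0 - beta[i] for i in covered)
    total = 0.0
    for j in range(len(sigma)):
        if j in tested:
            continue
        input_ok = prod(
            1.0 - sigma[j][i] * beta[i]
            for i in range(len(beta)) if i not in covered
        )
        total += q[j] * (1.0 - input_ok)
    return no_covered_bug * total


def enumerated_error_probability(sigma, beta, q, tested):
    """Sum over all 2**M bug patterns of P(pattern, accepted, user fails)."""
    total = 0.0
    for theta in product((0, 1), repeat=len(beta)):
        weight = prod(b if th else 1.0 - b for b, th in zip(beta, theta))
        fails = [any(s and th for s, th in zip(row, theta)) for row in sigma]
        if any(fails[j] for j in tested):
            continue  # the tester sees a failure and rejects the module
        total += weight * sum(qj for qj, f in zip(q, fails) if f)
    return total


sigma = [[1, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1]]  # 4 inputs, 3 bugs
beta = [0.2, 0.1, 0.3]
q = [0.25, 0.25, 0.25, 0.25]
print(product_form_error_probability(sigma, beta, q, {1}))
print(enumerated_error_probability(sigma, beta, q, {1}))
```

Testing input 1 alone covers bugs 0 and 1 in this example, so only bug 2 can
still reach the user.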

A clearer picture of how $P_e$ depends on the amount of testing, t, can
be obtained by deriving an upper bound from Eq. (15). For most cases of
interest this upper bound will be tight, i.e., a good approximation. Multiply
thru by the first product in Eq. (15):

$$P_e = \sum_{j=t+1}^{N} q_j \left[ \prod_{i=1}^{\tau} (1 - \beta_i) - \prod_{i=1}^{M} (1 - \gamma_{ji}\beta_i) \right] \qquad (16)$$

where

$$\gamma_{ji} = \begin{cases} 1, & 1 \le i \le \tau \\ \sigma_{ji}, & \tau < i \le M \end{cases}$$

Now observe the following:

$$\prod_{i=1}^{M} (1 - \gamma_{ji}\beta_i) \ge \prod_{i=1}^{M} (1 - \beta_i) \quad \text{since } \gamma_{ji} \le 1$$

$$\sum_{j=t+1}^{N} q_j = Q(t) \le 1 \quad \left(\text{note } Q(t) = 1 - \frac{t}{N} \approx 1 \text{ if } q_j = \frac{1}{N}\right)$$

$$\prod_{i=1}^{\tau} (1 - \beta_i) \le \exp\left(-\sum_{i=1}^{\tau} \beta_i\right)$$

Combining these:

$$P_e \le P_a = \exp\left(-\sum_{i=1}^{\tau} \beta_i\right) - R_0$$

where

$$R_0 = \prod_{i=1}^{M} (1 - \beta_i) \approx \exp\left(-\sum_{i=1}^{M} \beta_i\right) \qquad (17)$$

From the way the $\sigma$ matrix was permuted, it can be seen that as t → N,
$\tau$ → M (monotonically), and from Eq. (16) these, in turn, imply $P_e$ → 0
(monotonically). If Eq. (17) is a good approximation to Eq. (16), then the rate at
which $P_e$ → 0 can be seen to be exponential.
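The upper bound can be compared with the exact value on a small example.
The sketch below computes $P_e$ for deterministic testing and the bound
$P_a = \exp(-\sum \beta_i) - R_0$, where the sum runs over the bugs covered by
the tests and $R_0$ is the product of $1 - \beta_i$ over all bugs. The data
layout (`sigma[j][i]` = 1 when bug i breaks input j) and the numbers are
assumptions made for illustration.

```python
from itertools import combinations
from math import exp, prod


def exact_and_bound(sigma, beta, q, tested):
    """Return (P_e, P_a) for a deterministic set of tested inputs."""
    tested = set(tested)
    covered = {i for j in tested for i, hit in enumerate(sigma[j]) if hit}
    first = prod(1.0 - beta[i] for i in covered)
    p_e = 0.0
    for j, row in enumerate(sigma):
        if j in tested:
            continue
        # gamma is 1 for covered bugs and sigma[j][i] for the others.
        gamma = [1 if i in covered else row[i] for i in range(len(beta))]
        p_e += q[j] * (first - prod(1.0 - g * b for g, b in zip(gamma, beta)))
    r0 = prod(1.0 - b for b in beta)
    p_a = exp(-sum(beta[i] for i in covered)) - r0
    return p_e, p_a


sigma = [[1, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1]]
beta = [0.2, 0.1, 0.3]
q = [0.25, 0.25, 0.25, 0.25]
for size in range(len(sigma) + 1):
    for tested in combinations(range(len(sigma)), size):
        print(tested, exact_and_bound(sigma, beta, q, tested))
```

With only three bugs the bound is loose; the text's claim of tightness
refers to cases of practical interest, which this toy example does not
attempt to represent.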

3.4  Illustrative Special Case  A simple special case will illustrate
the above formulas, and is useful in getting a rough idea of the number of
tests and the probability of error. Suppose that each of the M bugs occurs
with the same probability of error, $\beta$, and affects the same number of input
values, b. Suppose further, that the pattern of input errors is the most
difficult to detect with the given number of tests, t, i.e., that the error
subsets are as "disjoint" as possible. If the tests are distributed most
effectively over the inputs, each test will detect Mb/N bugs. The testing
will result in M - Mbt/N undetected bugs, and $\tau$ = Mbt/N (assuming
bt < N). Thus

$$P_a = e^{-\beta M b t / N} - e^{-\beta M} \qquad (18)$$

As t → N/b this shows $P_a$ → 0. [...] testing, t must be comparable to [...].
With more general assumptions concerning the parameters, and with a [...]
choice of tests, t might be a much smaller fraction of N.

3.5  Optimum Testing Strategy (in Section 3.3)  For a given t, that
selection of inputs to be tested which minimizes $P_e$ will be defined as the
optimum testing strategy. From Eq. (15), and intuition, one might be
tempted to test the t inputs most likely to be chosen by the user, thus
minimizing the factor $\sum q_j$. However, for most cases of interest, this
approach would be futile. If the $q_j$ are even very roughly equal, the number
of possible inputs N is so huge that it would be impossible to test enough
of them to reduce

$$\sum_{j=t+1}^{N} q_j$$

significantly below unity. Essentially, the $q_j$ only influence the testing
strategy thru the fact that the square bracketed term in Eq. (15) depends
on j.

The first term in Eq. (15) is the most important for values of the
parameters of most interest, and as was shown above, it is minimized
(approximately) by choosing test values to maximize

$$\sum_{i=1}^{\tau} \beta_i \qquad (19a)$$

If the $\beta_i$ are equal, $P_e$ is then minimized (approximately) by maximizing $\tau$.
This is essentially the covering problem of switching theory and operations
research.¹ A set of inputs i ∈ I is said to cover a set of bugs j ∈ J if
$\sigma_{ij} = 1$ for every j in J for at least one i in I. Referring to Fig. 2, input
4 covers bugs 3, 4, 5, 6, 7 and 8; and input pair 1,4 covers bugs 3, 4,
5, 6, 7, 8, 10, 11, 12 and 13. These inputs are optimum in that they
cover the most bugs possible. Thus, using these optimum choices, $\tau$ = 6
and 10 for t = 1 and 2. However, the optimum set of 3 inputs is not
obtained by adding that input which would cover the most additional bugs.
Inputs 1, 3, 4 cover only 12 bugs, but inputs 1, 2, 3 cover 13 bugs. The
covering problem is usually stated as minimizing the number of tests
required to cover all bugs (in the present terminology), and no general
algorithm for its solution is known. The process described above,
repeatedly adding that test which causes the greatest increment in bugs covered,
is usually referred to as the "heaviest-first algorithm."
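The heaviest-first procedure and its possible shortfall can be shown on a
small example. The sketch below implements the greedy rule and compares it
with an exhaustive search over all test sets of the same size. The coverage
data are hypothetical and are not the bug pattern of Fig. 2; they were
chosen so that the greedy choice is not optimal for two tests.

```python
from itertools import combinations


def greedy_cover(covers, n_tests):
    """Heaviest-first: repeatedly add the input covering the most new bugs."""
    chosen, covered = [], set()
    for _ in range(min(n_tests, len(covers))):
        candidates = [name for name in covers if name not in chosen]
        best = max(candidates, key=lambda name: len(covers[name] - covered))
        chosen.append(best)
        covered |= covers[best]
    return chosen, covered


def optimal_cover(covers, n_tests):
    """Exhaustive search for the test set covering the most bugs."""
    def union(names):
        return set().union(*(covers[n] for n in names))

    best = max(combinations(covers, n_tests), key=lambda names: len(union(names)))
    return list(best), union(best)


# Hypothetical inputs "a", "b", "c" and the bugs each one excites.
covers = {"a": {1, 2, 3, 4}, "b": {1, 2, 5}, "c": {3, 4, 6}}
print(greedy_cover(covers, 2))
print(optimal_cover(covers, 2))
```

Greedy selection starts with input "a" because it covers four bugs, after
which any second input adds only one more; the pair "b" and "c" covers all
six. The exhaustive search grows combinatorially, so it serves only as a
check on small cases.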

A sketch of $\tau$ vs. t for the present example is shown in Fig. 3.

Note that the number of covered bugs added at each step is a decreasing
function of t. If t is very large, the bugs being covered might
be those affecting only a single input, such as divide-by-zero bugs.

The rate of decrease of $P_e$ with increasing t is certainly of great interest,
and one simple functional dependence can be given using the above
ideas. Define $\tau_1$, $\tau_2$, ... as those values corresponding to t = 1, 2, ... when
optimum or near-optimum testing strategy is used. Eq. (19a) can be rewritten

$$\sum_{i=1}^{\tau_t} \beta_i = \sum_{k=1}^{t} B_k \qquad (20)$$

where

$$B_k = \sum_{i=\tau_{k-1}+1}^{\tau_k} \beta_i, \qquad \tau_0 = 0$$

From Eq. (15) with the second and third terms approximated by unity, or
from Eq. (17) with $R_0$ approximated by zero:

$$P_e(t) \approx \exp\left(-\sum_{k=1}^{t} B_k\right) \qquad (21)$$

Now if $B_k$ is assumed to have a form which is a decreasing function of k
and which can be finitely summed in closed form, a convenient relation can
be found. For example, if $B_k = c/k$

$$\sum_{k=1}^{t} \frac{c}{k} \approx 0.5772\,c + c \ln t \qquad (22)$$

so that, by Eq. (21), $P_e(t)$ falls off approximately as $t^{-c}$.
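The closed-form approximation for the sum of c/k can be checked directly.
The sketch below compares the exact partial sum with c times (Euler's
constant, about 0.5772, plus ln t), and shows the resulting power-law
behavior when the sum is used as the exponent of a decaying exponential.
The function names and the value of c are illustrative assumptions.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def exact_exponent(c, n_tests):
    """Exact sum of c/k for k = 1..t."""
    return sum(c / k for k in range(1, n_tests + 1))


def approximate_exponent(c, n_tests):
    """Asymptotic form c * (gamma + ln t)."""
    return c * (EULER_GAMMA + math.log(n_tests))


c = 2.0  # hypothetical constant
for t in (10, 100, 1000):
    exact = exact_exponent(c, t)
    approx = approximate_exponent(c, t)
    # exp(-approx) equals exp(-gamma * c) * t**(-c), a straight line on
    # log-log axes.
    print(t, exact, approx, math.exp(-exact), math.exp(-approx))
```

The gap between the exact and approximate exponents shrinks roughly as
c/(2t), so the power-law description improves as the number of tests grows.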


This shows what might be expected in practice, a gradual decrease of $P_e$
over many decades as t is increased over several decades. When some
relations are plotted, e.g., Eq. (2) or Eq. (18), they exhibit very sudden
and deep drops in $P_e$ when certain values of t are approached, and this is
not the behavior to be expected except in the simplest programs. Of
course, Eq. (22) does not have enough parameters, but similar formulas can
be found which do have enough parameters.

3.6  Interpretation of a Program Bug  There are many possible ways
to interpret a bug.

1.  Formally, a bug is simply a subset of input values which fail
together when the program is used, e.g., the region $B^2 - 4AC < 0$ in
solving a quadratic.

2.  A bug might be a careless typing error, such as A+B instead of
A-B.

3.  A bug might be an incorrect statement or a defective subroutine.

4.  A more flexible definition of bug should allow such things as
omitting a check for dividing by zero. These bugs of omission are harder
to handle with fixed M and matrix $\sigma_{ij}$.

5.  A bug might be defined as a distinct path thru a flowchart. A
popular testing procedure is to run at least once thru every such path,
and if these are the only bugs included, this constitutes a complete cover
and $P_e = 0$. Unfortunately, the model would then not be very realistic
because $P_e$ would never actually vanish except in the most trivial programs.

6.  If programming can be identified with decisions, a bug is a wrong
decision.


4.0  Random  Testing 

4.1  Random Test Models  Some of the difficulties involved in relating
the number of bugs discovered and the number of tests made can be avoided
if the tests are chosen randomly rather than deterministically. Optimum
strategies can still be found if the probabilities of choosing the inputs to be
tested are not equal. The probability of error $P_e$ for random testing has
already been found in Eq. (14) above. Let $s_\ell$ be the probability that the
tester chooses the $\ell$'th input at any particular test. If the number of tests
is t, then the probability that the $\ell$'th input is tested during the series of
t tests is

$$r_\ell = 1 - (1 - s_\ell)^t \qquad (23)$$

Note that the $s_\ell$ sum to unity, but that each $r_\ell$ can range from zero to
one. It is helpful to define

$$A_i = \prod_{\ell=1}^{N} (1 - s_\ell)^{\sigma_{\ell i}} \qquad (24)$$

which is the approximate probability of a single test missing the i'th bug.
Note that $A_i$ is certainly no greater than unity. Eq. (14) can now be
rewritten in terms of s, A and t.

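$$P_e(t) = \prod_{i=1}^{M} \left(1 - \beta_i + \beta_i A_i^t\right) \sum_{j=1}^{N} q_j \left[ 1 - \prod_{i=1}^{M} \left( 1 - \frac{\sigma_{ji}\, \beta_i A_i^t}{1 - \beta_i + \beta_i A_i^t} \right) \right] \qquad (25)$$

As the number of tests is increased the probability of error approaches
zero

$$\lim_{t \to \infty} P_e(t) = 0$$

If no tests are made

$$P_e(0) = 1 - \sum_{j=1}^{N} q_j \prod_{i=1}^{M} (1 - \sigma_{ji}\beta_i) \qquad (26)$$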
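The behavior of the random-testing formula between its two limits can be
traced numerically. The sketch below evaluates $P_e(t)$ with
$A_i = \prod_\ell (1 - s_\ell)^{\sigma_{\ell i}}$ and checks it against the
no-test value. The data layout (`sigma[j][i]` = 1 when bug i breaks input j),
the requirement that q sums to one, and the sample numbers are assumptions
made for illustration.

```python
from math import isclose, prod


def random_testing_error_probability(sigma, beta, q, s, n_tests):
    """P_e(t) for t random tests drawn with per-test probabilities s."""
    n_bugs = len(beta)
    # A_i: approximate chance that one random test misses bug i.
    a = [
        prod((1.0 - s[l]) ** sigma[l][i] for l in range(len(s)))
        for i in range(n_bugs)
    ]
    miss = [beta[i] * a[i] ** n_tests for i in range(n_bugs)]
    denom = [1.0 - beta[i] + miss[i] for i in range(n_bugs)]
    accept = prod(denom)
    user = sum(
        q[j] * (1.0 - prod(1.0 - sigma[j][i] * miss[i] / denom[i]
                           for i in range(n_bugs)))
        for j in range(len(q))
    )
    return accept * user


def untested_error_probability(sigma, beta, q):
    """P_e(0): the user's failure probability when no tests are made."""
    return 1.0 - sum(
        q[j] * prod(1.0 - sigma[j][i] * beta[i] for i in range(len(beta)))
        for j in range(len(q))
    )


sigma = [[1, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1]]
beta = [0.2, 0.1, 0.3]
q = [0.25, 0.25, 0.25, 0.25]
s = [0.25, 0.25, 0.25, 0.25]
for t in (0, 1, 5, 20, 200):
    print(t, random_testing_error_probability(sigma, beta, q, s, t))
```

In this example every bug affects two of the four equally likely inputs, so
each $A_i$ is well below one and the decay with t is fast; bugs that touch a
tiny fraction of the inputs would give $A_i$ close to one and a much slower
decline.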


It is easier to see the importance of different terms in Eq. (25) by examining
the asymptotic form for large t:

[Eq. (27) is not recoverable from the source.]  (27)

where

[...] is the number of bugs with a probability of discovery $Q_k$.
(Those bugs which have probability $Q_k$ will be called a "block".)

and

$\bar{\beta}_k$ is the average probability of bugs in the k'th block being
introduced by the program writer.

No loss of generality results if the $Q_k$ are arranged in order of
decreasing magnitude. Eq. (30) clearly represents a decreasing function of
t, but it is not so apparent that the decrease is more like 1/t than
exponential over the range of interest. To see this, note that each term in
Eq. (30) reaches a maximum as a function of $Q_k$ at $Q_k = 1/t$. For each
t a certain term in the sum will be dominant and values of k both larger
and smaller than this dominant term will contribute relatively small amounts.
This leads to an approximate formula


[Eq. (31): an approximation for $P_e(t_k)$, where $t_k = 1/Q_k$; the right-hand side is not legible in the source.]


If the product is approximately independent of $k$, then $P_e$ will
decrease approximately as $1/t_k$. The decrease is exponential beyond
$t = 1/Q_{k\,\min}$, but this value could be very large indeed. For example,
if the input is a 32 bit number, and if a bug causes an error for a single
input value, then


Such  a  bug  would  be  very  hard  to  detect  by  random  testing,  but  by 
the  same  token  it  would  be  very  unlikely  to  cause  a  user  error  unless  the 
user favors that value for some reason. The behavior of Eqs. (30) and
(31) is sketched in Fig. 4 and Fig. 5.
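The qualitative argument above can be explored with a short sketch. Because the asymptotic equation itself is not legible here, the code assumes each block contributes a term of the form $w_k Q_k e^{-Q_k t}$, which is consistent with the statement that each term peaks at $Q_k = 1/t$. The weights, the decade spacing of the $Q_k$, and all function names are hypothetical.

```python
import math


def block_term(weight, q_detect, t):
    """One assumed block contribution: weight * Q * exp(-Q * t)."""
    return weight * q_detect * math.exp(-q_detect * t)


def asymptotic_error(t, weights, q_values):
    """Sum of the assumed block contributions at test count t."""
    return sum(block_term(w, q, t) for w, q in zip(weights, q_values))


def loglog_slope(f, t1, t2):
    """Slope of log f versus log t between t1 and t2."""
    return (math.log(f(t2)) - math.log(f(t1))) / (math.log(t2) - math.log(t1))


# Hypothetical blocks: detection probabilities a decade apart, equal weights.
q_values = [10.0 ** (-k) for k in range(9)]  # Q from 1 down to 1e-8
weights = [1.0] * len(q_values)
pe = lambda t: asymptotic_error(t, weights, q_values)

slope_mid = loglog_slope(pe, 1e3, 1e4)     # between 1/Q_max and 1/Q_min
slope_tail = loglog_slope(pe, 1e9, 1e10)   # past 1/Q_min = 1e8

# A bug that fails on one value of a 32-bit input, under uniform random testing,
# has Q = 2**-32, so roughly this many tests are needed before it is likely found.
tests_for_single_value_bug = 2 ** 32
```

Under these assumptions the log-log slope in the middle range is close to -1, that is, a decrease like $1/t$, while the much steeper exponential fall-off appears only once $t$ exceeds the reciprocal of the smallest $Q_k$.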


Fig. 5 shows the way in which the probability of error decreases
with the number of tests $t$. The parameters were chosen to illustrate the
following cases:

i) $P_e$ decreases slowly at first, then rapidly

ii) $P_e$ decreases gradually throughout the range

iii) $P_e$ decreases rapidly with the first few tests, then many
more tests are required to obtain a further decrease.

As  might  be  expected,  in  case  (i)  above  most  bugs  are  difficult  to 
detect,  in  case  (ii)  bugs  are  evenly  distributed,  and  in  case  (iii)  bugs  are 
either  very  easy  or  very  difficult  to  detect. 


Program bugs have been divided into four, and in some cases five,
classes, indicated by $k$ in the following:

$Q_k$ is the probability of detecting a class $k$ bug

$p_k$ is the probability that a user has introduced a class $k$ bug

is the number of possible class $k$ bugs.

[Figure axis label: PROBABILITY OF ERROR]

Ease of detection, as reflected in $Q_k$, depends largely on the number of
inputs a bug affects.


4.2 Optimum Testing Strategy (in Section 4.1)  Even though the
inputs to be tested are to be selected randomly, an optimum strategy exists
in that the probabilities of testing various inputs can be chosen so as to
minimize $P_e$. The asymptotic form expressed by Eq. (27) can be rewritten


(32) 


The quantities $p_i$ and $Q_i$ are fixed by the problem, but the $s_\ell$ can be
chosen to minimize $P_e$, at least insofar as they depend on the $s_\ell$ (see
Eq. (29)).

The optimum $s_\ell$ can easily be found for the special case where
$\sigma_{ij} = \delta_{ij}$, i.e., where each bug affects only one input. The
problem is to minimize the asymptotic form (with some additional
approximations):

\[ f = \sum_{i=1}^{N} p_i q_i e^{-s_i t} \]

while keeping constant

\[ 1 = g = \sum_{i=1}^{N} s_i \]

The method of Lagrange multipliers gives the equation

\[ 0 = \frac{\partial f}{\partial s_k} + \lambda \frac{\partial g}{\partial s_k} = -p_k q_k t e^{-s_k t} + \lambda \]

After eliminating $\lambda$ to ensure that the $s_i$ sum to unity, this gives

\[ s_k = \frac{1}{N} + \frac{1}{t} \ln \frac{p_k q_k}{c} \tag{33} \]

where

\[ \ln c = \frac{1}{N} \sum_{i=1}^{N} \ln p_i q_i \]

This  solution  shows  that  if  a  small  number  of  tests  are  made,  the  tester 
should  favor  the  inputs  most  likely  to  be  used  and  most  likely  to  be  in 
error,  but  that  if  a  large  number  of  tests  are  to  be  made  the  tester  should 
select  the  inputs  with  an  essentially  uniform  distribution.  Even  where  a 
non-uniform  distribution  of  tests  is  indicated  by  Eq.  (33),  the  tester  should 
distribute  his  tests  more  uniformly  than  the  user  because  of  the  logarithm 
(assuming the $p_i$ do not deviate too markedly from uniformity).
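The behavior described above can be reproduced with a short sketch. It assumes the objective $f = \sum_i p_i q_i e^{-s_i t}$ and the closed form that follows from the Lagrange condition, $s_k = 1/N + (1/t)\ln(p_k q_k / c)$ with $c$ the geometric mean of the $p_i q_i$. The function names and the example numbers are hypothetical, and the formula is only usable when every $s_k$ comes out non-negative.

```python
import math


def optimum_test_distribution(p, q, t):
    """s_k = 1/N + (1/t) * ln(p_k q_k / c), c being the geometric mean of p_i q_i."""
    n = len(p)
    log_pq = [math.log(pi * qi) for pi, qi in zip(p, q)]
    log_c = sum(log_pq) / n
    s = [1.0 / n + (lp - log_c) / t for lp in log_pq]
    if min(s) < 0.0:
        # The unconstrained stationary point is not a valid distribution here.
        raise ValueError("t too small: some s_k would be negative")
    return s


def residual_error(p, q, s, t):
    """The objective f = sum_i p_i q_i exp(-s_i t)."""
    return sum(pi * qi * math.exp(-si * t) for pi, qi, si in zip(p, q, s))


# Hypothetical example: equal bug probabilities, skewed user distribution.
p = [0.1, 0.1, 0.1, 0.1]
q = [0.4, 0.3, 0.2, 0.1]
t = 20
s_opt = optimum_test_distribution(p, q, t)
uniform = [0.25] * 4

f_opt = residual_error(p, q, s_opt, t)
f_uniform = residual_error(p, q, uniform, t)
f_user = residual_error(p, q, q, t)  # tester simply copies the user's distribution
```

In this example the optimum distribution is flatter than the user's distribution and gives a smaller objective than either uniform testing or copying the user. Re-running with a much larger `t` moves `s_opt` toward the uniform distribution.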

The optimum testing strategy when $\sigma_{ij}$ is a delta function (each bug
affecting only one input) can be quite misleading for the more general case.
It may be quite impossible to make the $s_i$ in Eq. (32) either independent of
$i$, or (even logarithmically) proportional to the $Q_i$. Also, if one returns to
the more general Eq. (25) or Eq. (17) and attempts to choose the $s_\ell$ to
minimize $P_e$ or $P_e'$, it is found that the best value for $s_\ell$ is either zero or
one and that these values depend on an unrealistically detailed knowledge of
$\sigma_{ij}$. This is really the same as the optimum deterministic testing discussed
in Section 3.5.

5.0.  Alternative  Definitions  of  Error 


The  definition  of  error  which  was  used  above  may  not  be  suitable  for 
all  purposes,  but  related  probabilities  can  easily  be  calculated  from  the 
equations  given.  Define  two  events  as  follows: 

$E_u$ is the event that a user employs the module once and encounters a
failure.

$E_m$ is the event that a tester operates the module $t$ times and does not
encounter a failure, i.e., the tester accepts the module. The subscript $m$
is mnemonic for "missed," since bugs are always assumed to be present with
some probability, and therefore $\Pr\{E_m\}$ is the probability that the tester
has missed all bugs, i.e., incorrectly accepted the module.

Three different quantities which might be interpreted as the probability
of user error are tabulated below (Table I) for the various models which
have been analyzed. The effectiveness of the testing process can be
judged by comparing the first with the last two. Note that the last,
$\Pr\{E_u \mid E_m\}$, is always greater than the second, $\Pr\{E_u, E_m\} = P_e$. This fact
might lead to the use of the conditional probability, $\Pr\{E_u \mid E_m\}$, in some cases
where $P_e$ is due to tester rejection and where there is a large probability
of module bugs.
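The three quantities can be compared in a deliberately simple sketch. It writes $E_u$ for the user-failure event and $E_m$ for the tester-acceptance event, and it assumes a hypothetical one-bug model that is not one of the report's models: the bug is present with probability `p`, and every run (test or use) independently triggers it with probability `q_fail`.

```python
def user_error_probabilities(p, q_fail, t):
    """Three candidate 'probabilities of user error' for an assumed one-bug model.

    p      : probability that the bug is present in the module
    q_fail : probability that a single run (test or use) hits the bug
    t      : number of independent tests
    """
    pr_eu = p * q_fail                            # Pr{E_u}: user fails, testing ignored
    pr_em = (1.0 - p) + p * (1.0 - q_fail) ** t   # Pr{E_m}: all t tests pass
    pr_joint = p * q_fail * (1.0 - q_fail) ** t   # Pr{E_u, E_m}
    pr_cond = pr_joint / pr_em                    # Pr{E_u | E_m}
    return pr_eu, pr_joint, pr_cond


# Hypothetical numbers: a likely bug that is moderately hard to trigger.
pr_eu, pr_joint, pr_cond = user_error_probabilities(p=0.3, q_fail=0.05, t=20)
```

In this model the conditional probability is never smaller than the joint probability, because it is the joint probability divided by $\Pr\{E_m\} \le 1$, and the gap widens when the tester is likely to reject the module.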

One is tempted to regard the goal here as minimization of $\Pr\{E_u\}$,
regarding the avoidance of user error as of most importance to the user.
But remember that this report seeks an optimum testing method, and that this is
different from and does not preclude previously using an optimum programming
method. The latter will minimize user error, but will not generally
remove the need for testing in addition.

6.0  Conclusions  and  Comments 


It is believed that defining what is meant by the probability of a program
error, and presenting a model which permits its exact calculation
(Eq. (15)), will provide the nucleus around which a theory of software
reliability can be built. The purpose is not merely to get a formula into which
numbers can be plugged to give a probability of error -- this does nothing
to reduce errors. Rather, it is anticipated that by classifying different
types of bugs, errors and test results, and by showing how they interact,
programming systems can be improved and optimum testing procedures can
be found.

Most reports on program testing select the test data so as to traverse
each path in the program at least once. The approach described in
this report might seem to be directed at selecting test data according to the
problem specification. A near-optimum strategy may be to select test data
randomly, since this seems to be a fairly efficient solution to the covering
problem, but to also make sure some test points are in various distinct data
domains (e.g., $B^2 - 4AC < 0$) and to also make sure each program path is
traversed. In addition, it is desirable to concentrate test data at points
known to be more likely to be selected by the user. These requirements
are often contradictory, but optimum compromises are suggested by
Eq. (11), Section 3.5, and Section 4.2 above (provided the stated assumptions
are met and the required parameters are available). Also, the formulas
derived give some idea of how many tests need to be made to achieve a
given reliability.
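One part of this mixed strategy, random selection topped up so that every data domain is represented, can be sketched as follows. The example uses the sign of $B^2 - 4AC$ for a quadratic with small integer coefficients; the coefficient range, the quota, and the function names are hypothetical choices. Path coverage and the weighting toward inputs favored by the user are not modeled.

```python
import random


def discriminant_domain(a, b, c):
    """Classify a quadratic's coefficients by the sign of B^2 - 4AC."""
    d = b * b - 4 * a * c
    return "negative" if d < 0 else ("zero" if d == 0 else "positive")


def select_test_data(n_random, min_per_domain, seed=0, max_extra_draws=100_000):
    """Draw n_random random cases, then add cases until each domain meets its quota."""
    rng = random.Random(seed)
    nonzero = [v for v in range(-5, 6) if v != 0]

    def draw():
        # A is kept non-zero so that every case is a genuine quadratic.
        return rng.choice(nonzero), rng.randint(-5, 5), rng.randint(-5, 5)

    tests = [draw() for _ in range(n_random)]
    counts = {"negative": 0, "zero": 0, "positive": 0}
    for case in tests:
        counts[discriminant_domain(*case)] += 1

    for _ in range(max_extra_draws):
        if min(counts.values()) >= min_per_domain:
            break
        case = draw()
        domain = discriminant_domain(*case)
        if counts[domain] < min_per_domain:  # keep only cases for under-filled domains
            tests.append(case)
            counts[domain] += 1
    return tests, counts


tests, counts = select_test_data(n_random=20, min_per_domain=3)
```

The rare domain (a zero discriminant) is the one that purely random selection tends to miss, which is why the top-up loop is needed.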


If further work is to be done along the lines of this report it is
suggested:

1) Data be gathered so as to verify the derived relations between
the number of tests and the error probability, especially
as in Section 4.0.

2) Another model can be developed in which a partial knowledge
of the bug-input matrix $\sigma$ is assumed. This would lead both
to a more easily applied optimum testing strategy, and to a
method of sequential testing in which the last test points are
chosen according to the results of the first tests.

Table I summarizes the models discussed.

[Table I: surviving column heading "Approximation"]


MISSION
Rome Air Development Center

RADC plans and executes research, development, test and
selected acquisition programs in support of Command, Control,
Communications and Intelligence (C3I) activities. Technical
and engineering support within areas of technical competence
is provided to ESD Program Offices (POs) and other ESD
elements. The principal technical mission areas are
communications, electromagnetic guidance and control,
surveillance of ground and aerospace objects, intelligence data
collection and handling, information system technology,
ionospheric propagation, solid state sciences, microwave
physics and electronic reliability, maintainability and
compatibility.


